Character encoding and transcoding
Note: 1. In Python 2, the default encoding is ASCII; in Python 3, strings are Unicode and the default encoding is UTF-8. 2. Unicode is divided into UTF-32 (4 bytes), UTF-16 (2 byte
004 - Python basics: character encoding and transcoding
I. Three encoding methods
ASCII: a computer encoding system based on the Latin alphabet, mainly used to display modern English and other Western European languages. Each character is expressed in at most 8 bits (one byte), so at most 2 ** 8 = 256 values (0 to 255) are available; standard ASCII defines 128 of them. T
This chapter analyzes the principles of Java encoding and decoding and briefly summarizes the problems of Chinese transcoding.
Directory
1 Basics of coding: ISO-8859-1 encoding, GBK, GB2312, UTF-8
2 Web system encoding conversion principles: Servlet, network transfer encoding, Struts2 control code, Spring control code
3 String to bytes
4 Bytes to string
1 Basics of coding
ISO-8859-1
This article shares how to detect the current character encoding in PHP and how to transcode, combining text and code. We hope it helps.
First, detect the current string encoding and change the encoding to UTF-8. 1. Get the encoding o
Remember this statement? # coding: utf8. It is needed because when the Python 2 interpreter executes a UTF-8 encoded file, it decodes the file with its default encoding, ASCII. Once the program contains Chinese, decoding naturally fails. Declaring # coding: utf8 at the beginning of the file tells the interpreter not to decode the file with the default encoding but to use UTF-8 instead. The Python 3 interpreter is much more convenient because it is en
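The decoding failure described above can be reproduced in Python 3 by decoding UTF-8 bytes by hand. This is only a minimal sketch: the sample source line and the variable names are made up for illustration, and it imitates the interpreter's behavior rather than invoking it.

```python
# Hypothetical one-line "source file" containing Chinese, stored as UTF-8 bytes.
source_bytes = "print('中文')".encode("utf-8")

error = None
try:
    # What an ASCII-defaulting interpreter effectively attempts.
    source_bytes.decode("ascii")
except UnicodeDecodeError as exc:
    error = exc

# What happens once the file is decoded as UTF-8 instead.
decoded = source_bytes.decode("utf-8")

print(type(error).__name__)
print(decoded)
```

Any byte above 127 is outside ASCII, so the first Chinese character is enough to trigger the error.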
Beginners in particular often develop projects in Eclipse without considering the encoding format. Eclipse's default encoding here is GBK, and when the project is finished, or partway through, it needs to be converted to UTF-8, a large amount of garbled Chinese text often appears. This article explains how to convert the encoding format of a project that has already been developed:
PHP can automatically recognize a character set encoding and complete the transcoding. The principle is simple: in GB2312/GBK a Chinese character occupies two bytes, each with its own value range, while in UTF-8 a Chinese character occupies three bytes, each also with its own value range.
Description
string mb_convert_encoding ( string $str , string $to_encoding [, mixed $from_encoding = mb_internal_encoding() ] )
Converts the character encoding of the string str from the optional from_encoding to to_encoding.
Parameters
str: the string to encode.
to_encoding: the type of encoding that str is being converted to.
from_encoding: specified by the character code name before conversion. It can be an
PHP function for transcoding Chinese characters to Unicode encoding
/**
* $str: the original string
* $encoding: the encoding of the original string; the default value is GBK
* $prefix: the prefix after encoding; the default value is "#"
are Unicode, so they only need to be encoded; there is no need to decode them to Unicode first.
To convert the string to GBK encoding:
s = "unicode字符串"
s_gbk = s.encode("gbk")
To convert the string to UTF-8 encoding:
s_utf8 = s.encode("utf-8")
To convert a GBK-encoded string to UTF-8, first decode the GBK data to Unicode, then encode the Unicode to UTF-8:
gbk_to_utf8 = s_gbk.decode("gbk").encode("utf-8")
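The decode-then-encode pattern above can be wrapped in a small helper. This is a sketch for Python 3, where the encoded values are bytes objects; the name transcode is hypothetical and not part of any library.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Decode bytes from the source encoding to Unicode, then encode to the target."""
    return data.decode(source).encode(target)


s = "unicode字符串"
s_gbk = s.encode("gbk")      # Unicode -> GBK bytes
s_utf8 = s.encode("utf-8")   # Unicode -> UTF-8 bytes

# GBK bytes -> Unicode -> UTF-8 bytes
gbk_to_utf8 = transcode(s_gbk, "gbk", "utf-8")
print(gbk_to_utf8 == s_utf8)
```

Unicode is always the intermediate step: there is no direct conversion between the two byte encodings.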
binary data is represented by the bytes type.
Two points that are easy to misunderstand:
(1) In Python 3 the default file encoding is UTF-8, so you can write Chinese directly without an encoding declaration in the file header.
(2) The variables you declare are Unicode by default, not UTF-8, even if the file header declares UTF-8, because strings are Unicode by default.
import sys
print(sys.getdefaultencoding())
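The difference between Unicode text and encoded bytes can be checked directly. This is a minimal sketch for Python 3; the sample text is arbitrary.

```python
import sys

default_encoding = sys.getdefaultencoding()  # "utf-8" in Python 3

text = "中文"                 # str: a sequence of Unicode characters
data = text.encode("utf-8")  # bytes: the UTF-8 representation of that text

print(default_encoding)
print(type(text).__name__, len(text))  # length counted in characters
print(type(data).__name__, len(data))  # length counted in bytes
```

The str holds two characters regardless of encoding; the byte count only exists once an encoding has been chosen.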
lot of 1, is 32 bits, the output low 4 bit, on 255*/
public class Demo {
/**
 * Encoding with GBK and decoding with UTF-8, then encoding with UTF-8 and decoding with GBK, cannot restore the text,
 * because both GBK and UTF-8 contain Chinese characters; with ISO8859-1 it can be restored.
 */
public static void main(String[] args) throws Exception {
String s = "Unicom";
byte[] b = s.getBytes("GBK"); // GBK encoding
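The point of the comment above can be illustrated in Python rather than Java. This is a sketch under assumptions: the sample text is arbitrary, and errors="replace" is used so the wrong UTF-8 decoding yields replacement characters instead of raising.

```python
text = "中文"
gbk_bytes = text.encode("gbk")

# Wrong decoding with ISO-8859-1: every byte maps to exactly one character,
# so re-encoding with ISO-8859-1 gives back the original bytes.
wrong_latin1 = gbk_bytes.decode("iso-8859-1")
recovered = wrong_latin1.encode("iso-8859-1").decode("gbk")

# Wrong decoding with UTF-8: invalid sequences become U+FFFD,
# so the original bytes cannot be rebuilt.
wrong_utf8 = gbk_bytes.decode("utf-8", errors="replace")
lossy_bytes = wrong_utf8.encode("utf-8")

print(recovered == text)         # the ISO-8859-1 round trip is reversible
print(lossy_bytes == gbk_bytes)  # the UTF-8 round trip is not
```

ISO-8859-1 is reversible because it assigns a character to all 256 byte values, so nothing is discarded along the way.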
In Python 3 the default character encoding is Unicode, which can be encoded directly to other encodings.
In Python 2 under Windows the default is GBK; everything that is not Unicode must first be decoded to Unicode and then encoded to the other character encoding.
import sys
print(sys.getdefaultencoding())  # display the default character encoding
a_unicode = "the end of yuqingping from the wind"  # this is in Unicode format
print(a_unicode)
a_gbk = a_unicode.encode('GBK')  # the default is Unicode
using System.Web; // reference System.Web
TextBox2.Text = System.Web.HttpUtility.UrlDecode(TextBox1.Text, System.Text.Encoding.GetEncoding("GB2312")); // convert the URL encoding to Simplified Chinese characters
TextBox2.Text = System.Web.HttpUtility.UrlEncode(TextBox1.Text, System.Text.Encoding.GetEncoding("GB2312")); // convert Simplified Chinese characters to URL encoding
TextBox2.Text = System.Web.HttpUtility.UrlDecode(TextBox1.Text, Syst
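For comparison, the same URL encoding and decoding with GB2312 can be sketched in Python with the standard urllib.parse module. The sample text is an assumption; the C# snippet above reads it from a text box.

```python
from urllib.parse import quote, unquote

text = "中文"

# Simplified Chinese characters -> URL encoding, using GB2312 bytes.
encoded = quote(text, encoding="gb2312")

# URL encoding -> Simplified Chinese characters.
decoded = unquote(encoded, encoding="gb2312")

print(encoded)
print(decoded)
```

Both sides must agree on the charset: decoding these escapes as UTF-8 would not restore the original text.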
This article summarizes Python encoding in detail and is shared here for your reference, as follows:
"So-called Unicode"
Unicode is an abstract encoding, similar to a symbol set: it specifies only the binary code of each symbol, not how that code should be stored. In other words, it is only an internal representation and cannot be saved directly. Storage therefore requires a concrete storage form, such as UTF-8 or UTF-16. In t
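The split between the abstract code and its storage forms can be seen by encoding a single character several ways. This is a minimal sketch; the character is arbitrary, and the big-endian variants are chosen only to avoid a byte order mark in the output.

```python
ch = "中"

code_point = ord(ch)            # the abstract Unicode code, independent of storage
utf8 = ch.encode("utf-8")       # stored as 3 bytes
utf16 = ch.encode("utf-16-be")  # stored as 2 bytes
utf32 = ch.encode("utf-32-be")  # stored as 4 bytes

print(hex(code_point))
print(utf8.hex(), utf16.hex(), utf32.hex())
```

The code point stays the same in every case; only the byte sequence written to disk differs.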
The principle is simple: in GB2312/GBK a Chinese character occupies two bytes, each with its own value range, while in UTF-8 a Chinese character occupies three bytes, each also with its own value range. English characters, regardless of the encoding, are below 128 and occupy only one byte (full-width forms excluded).
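The byte counts described above can be verified with a few lines of Python. This is a sketch; the helper name byte_lengths is hypothetical.

```python
def byte_lengths(ch: str) -> dict:
    """Return how many bytes one character occupies in GBK and in UTF-8."""
    return {enc: len(ch.encode(enc)) for enc in ("gbk", "utf-8")}


chinese = byte_lengths("中")     # a Chinese character
english = byte_lengths("A")      # a half-width English letter
full_width = byte_lengths("Ａ")  # the full-width form of the same letter

print(chinese, english, full_width)
```

The full-width letter behaves like a Chinese character in both encodings, which is why the rule excludes it.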
When PHP processes a page, we use the iconv or mb_convert functions for character set conversion. However, this has a prerequisite. That is, we must kn
Detecting the current character encoding and transcoding in PHP
1. Detect the current string encoding and change the encoding to UTF-8
1. Obtain the encoding of the current string.
$encode = mb_detect_encoding($str, array("ASCII", "UTF-8", "GB2312", "GBK", "BIG5"));
2. change the character
When PHP processes a page, we convert character sets with the iconv or mb_convert functions, but this has a prerequisite: we must know in advance what the input and output encodings are in order to convert correctly. The following function can automatically determine the encoding and convert the string without knowing the source string encoding
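The PHP function itself is not reproduced on this page. The same idea can be sketched in Python by trying candidate encodings in order and keeping the first one that decodes without error. The function name and the candidate list are assumptions, and the result is a heuristic: bytes that are valid in several encodings are attributed to whichever candidate comes first.

```python
def detect_and_convert(raw: bytes, candidates=("ascii", "utf-8", "gbk", "big5")):
    """Return (encoding_name, utf8_bytes) for the first candidate that decodes raw."""
    for name in candidates:
        try:
            text = raw.decode(name)  # strict decoding: fails on invalid bytes
        except UnicodeDecodeError:
            continue
        return name, text.encode("utf-8")
    raise ValueError("none of the candidate encodings fit the input")


print(detect_and_convert("中文".encode("gbk")))
print(detect_and_convert("中文".encode("utf-8")))
print(detect_and_convert(b"hello"))
```

Stricter encodings go first because ASCII and UTF-8 reject most foreign byte sequences, while GBK accepts many inputs that were never GBK.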